Evaluation of Fast K-nearest Neighbors Search Methods Using Real Data Sets
Author
Abstract
The k-nearest neighbors (kNN) search problem is to find the k nearest neighbors of a query point in a given data set. To speed up this search, many fast kNN search algorithms have been proposed. The performance of a fast kNN search algorithm is strongly influenced by the number of dimensions, the number of data points, and the data distribution of the data set. In extreme cases, a fast kNN search algorithm may perform worse than the full search algorithm. To help clarify how fast kNN search algorithms perform on different types of data sets, this paper tests five fast algorithms on multiple real data sets. The experimental results should be useful when choosing a fast kNN search algorithm for an unknown data set.
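For orientation, the full search baseline mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name `full_search_knn`, the use of Euclidean distance, and the tuple-based data layout are all assumptions made for the example.

```python
import heapq
import math
from typing import List, Sequence, Tuple


def full_search_knn(
    data: Sequence[Sequence[float]],
    query: Sequence[float],
    k: int,
) -> List[Tuple[float, int]]:
    """Return the k nearest points to `query` as (distance, index) pairs.

    Hypothetical helper: it computes the Euclidean distance from the query
    to every point in `data`, which is the exhaustive baseline that fast
    kNN search algorithms try to beat.
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    # One distance computation per data point; nothing is pruned.
    scored = ((math.dist(point, query), index) for index, point in enumerate(data))
    # nsmallest keeps only the k best candidates, sorted by ascending distance.
    return heapq.nsmallest(k, scored)


points = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (0.5, 0.0)]
neighbors = full_search_knn(points, (0.0, 0.0), k=2)
print(neighbors)
```

The cost of this sketch grows with both the number of data points and the number of dimensions, since every point is compared against the query. Fast kNN search algorithms aim to skip most of those comparisons, and the abstract notes that the attempt does not always pay off.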
Similar resources
A Novel Hybrid Approach for Email Spam Detection based on Scatter Search Algorithm and K-Nearest Neighbors
Because cyberspace and Internet predominate in the life of users, in addition to business opportunities and time reductions, threats like information theft, penetration into systems, etc. are included in the field of hardware and software. Security is the top priority to prevent a cyber-attack that users should initially be detecting the type of attacks because virtual environments are not moni...
Adaptive k-Nearest-Neighbor Classification Using a Dynamic Number of Nearest Neighbors
Classification based on k-nearest neighbors (kNN classification) is one of the most widely used classification methods. The number k of nearest neighbors used for achieving a high accuracy in classification is given in advance and is highly dependent on the data set used. If the size of data set is large, the sequential or binary search of NNs is inapplicable due to the increased computational ...
Fast k-NN classification rule using metric on space-filling curves
A fast nearest neighbor algorithm for pattern classification is proposed and tested on real data. The patterns (points in d-dimensional Euclidean space) are sorted along a space-filling curve. This way the multidimensional problem is compressed to the simplest case of the nearest neighbor search in one dimension.
Comparison and evaluation of the performance of data-driven models for estimating suspended sediment downstream of Doroodzan Dam
Dams control most of the sediment entering the reservoir by creating static environments. However, sediment leaving the dam depends on various factors such as dam management method, inlet sediment, water height in the reservoir, the shape of the reservoir, and discharge flow. In this research, the amount of suspended sediment of Doroodzan Dam based on a statistical period of 25 years has been i...
Diagnosis of Heart Disease Using Binary Grasshopper Optimization Algorithm and K-Nearest Neighbors
Introduction: The heart is one of the main organs of the human body, and its unhealthiness is an important factor in human mortality. Heart disease may be asymptomatic, but medical tests can predict and diagnose it. Diagnosis of heart disease requires extensive experience of specialist physicians. The aim of this study is to help physicians diagnose heart disease based on hybrid Binary Grasshop...